A user emailed me today about a problem with sfc0.x.
---
+ rm -rf sfc0.x
+ ln -fs /usr/local/G-RSM/gsm/bin/sfc0.x sfc0.x
+ ./sfc0.x
PGFIO-F-209/unformatted read/unit=19/'OLD' specified for file which does not exist.
In source file naopen.f, at line number 40
---
sfc0 reads a lot of files from the library, as unit 19. I suspect sfc0 can't find one of the input files. Please look at sfc0.out and see which file is causing the error. (I posted my sfc0.out on the Wiki for your reference.)
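One way to narrow down which unit-19 input is missing is to check each candidate file by hand. The sketch below is only an illustration: the helper name report_missing is hypothetical and not part of G-RSM, and the list of files to pass to it has to come from your own sfc0.out or run script.

```shell
#!/bin/sh
# report_missing: print every argument that is not a readable regular file.
# Hypothetical helper for illustration; it is not part of G-RSM.
report_missing() {
    rc=0
    for f in "$@"; do
        # A dangling symbolic link fails this test as well,
        # because -f and -r both follow the link to its target.
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "MISSING: $f"
            rc=1
        fi
    done
    return $rc
}
```

Call it as `report_missing file1 file2 ...` with the paths sfc0.x is expected to open; it returns non-zero if any of them is absent. If strace is installed, `strace -f -e trace=open,openat ./sfc0.x 2>&1 | grep ENOENT` may also show the failed open directly, although older strace versions might not recognise openat.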
I thought it was wiser to open a new thread of discussion, especially because sfc0.x now works correctly.
I printed the 'status' array and, as you told me to do, changed in the routine nainit.f the value of status in which isize is defined. With that change the executable sfc.x works and terminates correctly.
But... now I have several problems with fcst.x.
First of all, on 64-bit machines the "gm" libraries are linked and required. They are NOT easy to obtain, and since we have Ethernet and not Myrinet, in our opinion they should be unnecessary.
Anyway, we did obtain these libraries and installed them, but the execution of fcst.x stalls almost immediately. I don't know whether these libraries have anything to do with it: I have tried both linking them and modifying the makefiles so that they are not linked, but the program stalls in exactly the same way.
I will copy the file fcstout.ft00:
running /usr/local/G-RSM/runs/g_000/fcst.x on 8 LINUX ch_p4 processors
Created /usr/local/G-RSM/runs/g_000/PI2609
0getcon 62 28created april 92
0begin setsig - getting sigs from unit 11
1800. 0.92 0.80E+16 0.60E+16
reduce grid is on with 1 digit accuracy.
archv data from da,mo,yr= 0 3 66
...last date/time and current itim
0 1 1 90 8760.0 1
-------llyr,klowb = 6 4
co2 concentration is 3.4799999999999995E-004
rdsig lab 000b2 p sigma surface file n= 11
rdsig unit,fhour,idate= 11 0.0000000000000000 0
3 9 1990
number of tracers input = 1
number of cloud input = 0
rdsig gz z00= 329.2745436306341
rdsig q
rdsig te
rdsig di ze
rdsig rq
n1,itread,fhour after tread 11 0 0.0
input t=t0 full values
0div vort temp mixratio ln(ps) 0.85440144E-05 0.29136744E-04 0.25247229E+03 0.28508945E-02 0.45882381E+01
0.1238397114E-04 0.2204099817E-04 0.2888138897E+03 0.1161741630E-01
0.1215744913E-04 0.2349267774E-04 0.2880251284E+03 0.1121752612E-01
0.1162732756E-04 0.2504732324E-04 0.2869776967E+03 0.1074251095E-01
0.1057871503E-04 0.2595711210E-04 0.2856848077E+03 0.9859597806E-02
0.9075835834E-05 0.2584168528E-04 0.2841827167E+03 0.8569951030E-02
0.7587511711E-05 0.2476755789E-04 0.2824618701E+03 0.7492116218E-02
0.6626171057E-05 0.2371425427E-04 0.2806813982E+03 0.6400565791E-02
0.6512874369E-05 0.2335317677E-04 0.2784283916E+03 0.5462819446E-02
0.6794808790E-05 0.2363278408E-04 0.2756867530E+03 0.4596046369E-02
0.6892657240E-05 0.2406497907E-04 0.2724327121E+03 0.3725222207E-02
0.6853515540E-05 0.2515131087E-04 0.2683346805E+03 0.2877821560E-02
0.7187910596E-05 0.2779897989E-04 0.2635980815E+03 0.2234286054E-02
0.7409846957E-05 0.3168895088E-04 0.2577439552E+03 0.1609530220E-02
0.7991824166E-05 0.3664828580E-04 0.2510893702E+03 0.1090838298E-02
0.8888281333E-05 0.4163566053E-04 0.2437689619E+03 0.7402229396E-03
0.9607470158E-05 0.4404003882E-04 0.2357555520E+03 0.4783731284E-03
0.1015297215E-04 0.4241966109E-04 0.2284474071E+03 0.2531392236E-03
0.1089892464E-04 0.3724121506E-04 0.2220597300E+03 0.1047865826E-03
0.1067267332E-04 0.3102312609E-04 0.2162262618E+03 0.3456758205E-04
0.1079113490E-04 0.2533836083E-04 0.2114581829E+03 0.1170774166E-04
0.1000120317E-04 0.2114722876E-04 0.2071529513E+03 0.4986736203E-05
0.9230246229E-05 0.1705449550E-04 0.2068910773E+03 0.3137862543E-05
0.8838494866E-05 0.1482399686E-04 0.2099786732E+03 0.2386806010E-05
0.8880659046E-05 0.1520549559E-04 0.2141365663E+03 0.2457863029E-05
0.9136429626E-05 0.1767674144E-04 0.2183607314E+03 0.2530153449E-05
0.9169535142E-05 0.2180435179E-04 0.2236630677E+03 0.2478900717E-05
0.1008651591E-04 0.1991421991E-04 0.2324701195E+03 0.2214743552E-05
0.3835812659E-05 0.2009827500E-04 0.2539653894E+03 0.2019388695E-05
0div vort temp mixratio ln(ps) 0.85440144E-05 0.29136744E-04 0.25247229E+03 0.28508945E-02 0.45882381E+01
0.1238397114E-04 0.2204099817E-04 0.2888138897E+03 0.1161741630E-01
0.1215744913E-04 0.2349267774E-04 0.2880251284E+03 0.1121752612E-01
0.1162732756E-04 0.2504732324E-04 0.2869776967E+03 0.1074251095E-01
0.1057871503E-04 0.2595711210E-04 0.2856848077E+03 0.9859597806E-02
0.9075835834E-05 0.2584168528E-04 0.2841827167E+03 0.8569951030E-02
0.7587511711E-05 0.2476755789E-04 0.2824618701E+03 0.7492116218E-02
0.6626171057E-05 0.2371425427E-04 0.2806813982E+03 0.6400565791E-02
0.6512874369E-05 0.2335317677E-04 0.2784283916E+03 0.5462819446E-02
0.6794808790E-05 0.2363278408E-04 0.2756867530E+03 0.4596046369E-02
0.6892657240E-05 0.2406497907E-04 0.2724327121E+03 0.3725222207E-02
0.6853515540E-05 0.2515131087E-04 0.2683346805E+03 0.2877821560E-02
0.7187910596E-05 0.2779897989E-04 0.2635980815E+03 0.2234286054E-02
0.7409846957E-05 0.3168895088E-04 0.2577439552E+03 0.1609530220E-02
0.7991824166E-05 0.3664828580E-04 0.2510893702E+03 0.1090838298E-02
0.8888281333E-05 0.4163566053E-04 0.2437689619E+03 0.7402229396E-03
0.9607470158E-05 0.4404003882E-04 0.2357555520E+03 0.4783731284E-03
0.1015297215E-04 0.4241966109E-04 0.2284474071E+03 0.2531392236E-03
0.1089892464E-04 0.3724121506E-04 0.2220597300E+03 0.1047865826E-03
0.1067267332E-04 0.3102312609E-04 0.2162262618E+03 0.3456758205E-04
0.1079113490E-04 0.2533836083E-04 0.2114581829E+03 0.1170774166E-04
0.1000120317E-04 0.2114722876E-04 0.2071529513E+03 0.4986736203E-05
0.9230246229E-05 0.1705449550E-04 0.2068910773E+03 0.3137862543E-05
0.8838494866E-05 0.1482399686E-04 0.2099786732E+03 0.2386806010E-05
0.8880659046E-05 0.1520549559E-04 0.2141365663E+03 0.2457863029E-05
0.9136429626E-05 0.1767674144E-04 0.2183607314E+03 0.2530153449E-05
0.9169535142E-05 0.2180435179E-04 0.2236630677E+03 0.2478900717E-05
0.1008651591E-04 0.1991421991E-04 0.2324701195E+03 0.2214743552E-05
0.3835812659E-05 0.2009827500E-04 0.2539653894E+03 0.2019388695E-05
initial solhr = 0.0000000000000000
fixio field read in from unit= 11
fh, idate= 0.0 0 3 9 1990
fixio completed.
forward step: kdt in gsmstep= 1
* nnday of year = 68
0from heatl3 jdnmc etc 2447959 0.50 0.00 68.00 68.00
0 forecast date 9 mar. 1990 at 0 hrs 0.00 mins
julian day 2447959 plus 0.500000
radius vector 0.9928091
right ascension of sun 23.2742673 hrs, or 23 hrs 16 mins 27.4 secs
declination of the sun -4.6811514 degs, or -4 degs 40 mins 52.1 secs
equation of time -10.7192070 mins, or -643.15 secs, or-0.046899 radians
solar constant 2.0254750
1 ozoneKilled by signal 2.
I am running on 8 processors. On the frontend machine of the cluster the program doesn't abort but remains idle ("Killed by signal 2" is caused by my killing the process manually), while it aborts on all the machines to which it is distributed.
As I was saying before, I do not know whether the problem is related to a difficulty or a mistake in distributing the calculations across the different processors, or is more simply due to a difficulty in reading the ozone or the aerosol files.
Thank you again for your time and your prompt answers.
First of all, thanks to all of you for your responses. Sadly, I am not working full time on this, so some days may pass without a reply from me.
Anyway, I tried each one of your suggestions but I am still stuck at the same point.
To answer Hideki: I tried running this experiment on only one processor and the model runs correctly.
I tried deleting the "-lgm" option, but with or without the libraries linked, nothing changes.
I also exported the variable P4_GLOBMEMSIZE on every machine of the cluster, but no improvement.
I also "unlimited the stacksize" on each machine of the cluster.
To answer the last suggestion: the ozone files are in a directory common to all the machines of the cluster and are accessible from all of them.
I copied the file fcstout.ft00 in the previous message.
On every machine other than the frontend of the cluster (that is, where the calculations are actually performed) I get a SEGMENTATION FAULT error.
I did have to change FCSTENV in the "fcst" run script by adding the "-nolocal" flag to the "mpirun -np 8" command, which excludes the frontend from being used for calculations. I hope this isn't what causes the problem; it shouldn't be.
At this point, though, I really have no clue what the problem could be.
Thank you again to all of you, also for any other suggestion you might have.
Alessandro
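For anyone following along, the two environment changes mentioned above (exporting P4_GLOBMEMSIZE and removing the stack size limit) might look roughly like the sketch below. The value shown is an assumption chosen only for illustration, on the understanding that the variable is given in bytes. Settings made in an interactive session may not reach processes that mpirun starts remotely, so they may need to go in a shell startup file on every node.

```shell
#!/bin/sh
# Assumed value for illustration: 256 MB expressed in bytes. Tune it for your run.
P4_GLOBMEMSIZE=268435456
export P4_GLOBMEMSIZE

# Try to remove the stack size limit. If the hard limit forbids it,
# report the current value instead of aborting.
if ulimit -s unlimited 2>/dev/null; then
    echo "stack limit: $(ulimit -s)"
else
    echo "could not raise stack limit; current value: $(ulimit -s)"
fi
```

Printing the limit makes it easy to confirm, node by node, that the setting actually took effect where fcst.x runs.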
Hi,
I am trying the gsm script but it bombed at fcst.x. I am running on an Intel dual-core machine with 32-bit Fedora Core 6, and compiled with the Intel Fortran and C compilers. The output from fcstout.ft00 is as follows. I would appreciate your help and advice. Regards.
Hi - it is a bit difficult to figure out what is wrong with this limited information. I suspect a problem with rand() in setras.f. See the previous thread by alessandrocem and lonestar%20at%20TACC.html. If it is not causing the error, I suggest recompiling the code with the DBG option on. See Debug.html
Please let me know when you find out more. Hideki
Thank you for your suggestions, but I still can't manage to run the 'gsm' script. It always aborts when executing sfc0.x.
Unlike last time, I downloaded and compiled using the command 'inst gsm_latest lo', which I think works correctly. I have version 5.1 of the PG Fortran compiler.
All my sfc0.out file says is this:
PGFIO-F-209/unformatted read/unit=19/'OLD' specified for file which does not exist.
In source file naopen.f, at line number 40
It seems the program isn't able to open even the first file; otherwise I would have at least a piece of sfc0.out similar to the one you posted.
I browsed through the source files (starting from sfc0.f), but I wasn't able to find which subroutine calls the routine in 'naopen.f' or which input file it refers to.
Thank you in advance for other ideas or suggestions.
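A quick way to look for the caller is to search the source tree for call statements. The sketch below assumes the routine defined in naopen.f is itself named naopen, which the thread does not confirm; the helper name find_callers is hypothetical.

```shell
#!/bin/sh
# find_callers NAME DIR: list source lines under DIR that call subroutine NAME.
# Matching is case-insensitive because Fortran is; output is file:line:text.
find_callers() {
    name=$1
    dir=$2
    # Require a non-identifier character (or end of line) after NAME,
    # so that "naopen" does not also match a longer name such as "naopenx".
    grep -rniE "call[[:space:]]+${name}([^[:alnum:]_]|\$)" "$dir"
}
```

For example, `find_callers naopen .` run from the top of the source directory lists each file and line number containing such a call. The arguments passed at those call sites should then indicate where the file name opened on unit 19 comes from.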